Abstract

A binocular system developed by the author in terms of projective 
Fourier transform (PFT) of the conformal camera, which numerically in- 
tegrates the head, eyes, and visual cortex, is used to process visual infor- 
mation during saccadic eye movements. Although we make three saccades 
per second at the eyeball's maximum speed of 700 deg/sec, our visual sys- 
tem accounts for these incisive eye movements to produce a stable percept 
of the world. This visual constancy is maintained by neuronal receptive 
field shifts in various retinotopically organized cortical areas prior to saccade
onset, giving the brain access to visual information from the saccade's
target before the eyes' arrival. It integrates visual information acquisition 
across saccades. Our modeling utilizes basic properties of PFT. First, PFT 
is computable by FFT in complex logarithmic coordinates that approximate
the retinotopy. Second, a translation in retinotopic (logarithmic) coordinates,
modeled by the shift property of the Fourier transform, remaps
the presaccadic scene into a postsaccadic reference frame. It also accounts 
for the perisaccadic mislocalization observed by human subjects in labo- 
ratory experiments. Because our modeling involves cross-disciplinary ar- 
eas of conformal geometry, abstract and computational harmonic analysis, 
computational vision, and visual neuroscience, we include the correspond- 
ing background material and elucidate how these different areas interwove 
in our modeling of primate perception. In particular, we present the phys- 
iological and behavioral facts underlying the neural processes related to 
our modeling. We also emphasize the conformal camera's geometry and 
discuss how it is uniquely useful in the intermediate-level vision compu- 
tational aspects of natural scene understanding. 

Keywords: The conformal camera, projective Fourier transform, com- 
plex projective geometry, intermediate-level vision, retinotopy, binocular 
vision, saccades, efference copy, predictive remapping, perisaccadic mislo- 
calization 



1 Introduction 



In the last few years, we have developed projective Fourier analysis for compu- 
tational vision in the framework of the representation theory of the semisimple 
Lie group SL(2,C) [Ml IMl EHl EU [62]. It was done by restricting the group 
representations to the image plane of the conformal camera — the camera with 
image projective transformations given by the action of SL(2, C). This analysis 
provides an efficient image representation and processing that are not only well 
adapted to the projective transformations of retinal images but also to the
retinotopic mappings of the brain's oculomotor and visual pathways. This lat- 
ter assertion stems from the fact that the projective Fourier transform (PFT) is 
computable by a fast Fourier transform algorithm (FFT) in coordinates given 
by a complex logarithm that transforms PFT into the standard Fourier integral 
and at the same time approximates the retinotopic mappings [54].

However, the conformal camera is somewhat abstract and noticeably different
from any other camera model used in computer vision. Nevertheless, its
remarkable advantages are revealed to us every time we model specific physiolog- 
ical processes involved in visual perception. For instance, one could reasonably 
expect that a stationary camera viewing a moving object is equivalent to a moving
camera viewing a stationary object, because the relative position of the camera and the
object could be the same in both cases. Remarkably, this expectation fails in primate
vision systems. In fact, when the image of a fast-moving object sweeps across
a static retina, though we are normally aware of its motion, we fail to detect 
the comparable motion of images as they sweep across the retina during fast 
eye movements. Computational modeling presented in this article demonstrates 
that the conformal camera naturally supports this asymmetry. 

Recently, building on projective Fourier analysis of the conformal camera, a 
mathematical model integrating the head, eyes, and visual cortex into a single 
computational binocular system was introduced in [63], with particular focus on
stereopsis. Here it is demonstrated that this integrated system may efficiently 
process visual information during fast scanning eye movements called saccades, 
employed to build up an understanding of a scene even though the highest acuity is
confined to the central foveal region of about 2 deg of visual angle. We make about three
saccades per second at the eyeball's maximum speed of 700 deg/sec. Visual 
sensitivity is markedly reduced during saccades as we do not see moving retinal 
images. These fragmented pieces of visual information are sent to the cortical 
areas, with a minor part going to subcortical areas where they are integrated 
into a stable, coherent percept of a 3D world despite the persistence of incisive
eye movements. This constancy of vision is maintained by a widespread neural
network with multiple mechanisms receiving inputs from several sources. Not
surprisingly, in spite of significant recent progress, how this problem is solved
by the brain has been the topic of many theories; see [69] for a recent review.

The modeling presented in this article, first proposed in [BT, utilizes basic 
properties of PFT to capture some of the very first computational aspects of 
the neural processes during the saccadic eye movements. First, because the 
PFT of an image can be efficiently computed by FFT in complex logarithmic 
coordinates that also approximate the retinotopy, the output from the inverse
PFT resembles the cortical representation of the image. Second, a simple trans- 
lation in retinotopic (logarithmic) coordinates that is efficiently modeled here 
by the standard shift property of the inverse PFT when expressed in these co- 
ordinates, remaps the presaccadic scene in the reference frame centered on the 
fovea into a postsaccadic reference frame centered on the impending saccade 
target. Equivalently, it uniformly shifts images around the target in cortical 
periphery to the cortical foveal location. Moreover, this shift that takes place 
in retinotopic (logarithmic) coordinates accounts for perceptual space compres- 
sion seen around the time of saccadic eye movements by human subjects in 
psychophysical laboratory experiments [331 ES] . 
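
The shift property invoked above can be illustrated with a minimal sketch. It is not the discrete PFT of this article: it only assumes an image already sampled on a hypothetical log-polar grid (rows indexed by u = ln r, columns by the angle), uses a naive DFT as a stand-in for an FFT, and assumes periodic wrap-around in both coordinates. The names `dft2`, `idft2`, and `shift_via_phase` are illustrative.

```python
import cmath

def dft2(a):
    """Naive 2D DFT of a list of lists (stand-in for an FFT)."""
    M, N = len(a), len(a[0])
    return [[sum(a[m][n] * cmath.exp(-2j * cmath.pi * (k * m / M + l * n / N))
                 for m in range(M) for n in range(N))
             for l in range(N)] for k in range(M)]

def idft2(A):
    """Naive inverse 2D DFT."""
    M, N = len(A), len(A[0])
    return [[sum(A[k][l] * cmath.exp(2j * cmath.pi * (k * m / M + l * n / N))
                 for k in range(M) for l in range(N)) / (M * N)
             for n in range(N)] for m in range(M)]

def shift_via_phase(a, dm, dn):
    """Circularly shift samples by (dm, dn) using only a phase factor
    applied in the Fourier domain (the standard shift property)."""
    M, N = len(a), len(a[0])
    A = dft2(a)
    B = [[A[k][l] * cmath.exp(-2j * cmath.pi * (k * dm / M + l * dn / N))
          for l in range(N)] for k in range(M)]
    return idft2(B)

# Toy image on a 4 x 6 log-polar grid (assumed sampling, for illustration only)
image = [[float(m * 6 + n) for n in range(6)] for m in range(4)]
shifted = shift_via_phase(image, 1, 2)
```

In this sketch a translation in the logarithmic coordinates costs one multiplication per Fourier coefficient, which is the computational point of modeling the remapping by the shift property.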

The idea of remapping is supported by the fact that the neural correlates 
of a copy of the oculomotor command to move eyes, known as efference copy 
or corollary discharge [24], have been found in the form of a neuronal receptive
field shift about 50 ms before a saccade onset in various retinotopically organized 
visual cortical areas [TH] [37] . This shift points to the possibility that prior to the 
eyes arriving at the target, the brain has access to visual information from that 
peripheral region. In fact, in the recent experiment [3T], when human subjects 
shifted fixation to the clock, their reported time was earlier than the actual 
time on the clock by about 40 ms. This remapping may integrate visual information from
an object across saccades and therefore eliminate the need to start visual
information processing anew three times per second at each fixation, speeding
up the costly process of visual information acquisition. It may also build up
perceptual continuity across fixations [45].

The conformal camera was initially constructed for the purpose of developing 
projectively adapted image representation in the framework of the only well un- 
derstood 'projective' Fourier analysis formulated as a direction in the represen- 
tation theory of semisimple Lie groups, a great achievement of the 20th-century 
mathematics [29] . In the case of the conformal camera, it is the representation 
theory of the group SL(2,C), the group generating image projective transformations
in a conformal geometry setting; see [61], where a brief introduction to
the group representations is also given. When writing this article, it became 
apparent that we should carefully set a stage for our modeling that involves 
conformal projective geometry, abstract and computational harmonic analysis, 
image processing, and computational vision including visual neuroscience and 
machine vision. Thus, the overarching aim of this article is to elucidate how 
these cross-disciplinary areas interwove in our modeling of primate perception. 

To this end, the paper is organized as follows. In the next section, we in- 
clude, in some detail, physiological and behavioral facts that underlie the neural 
processes of human vision related to our computational modeling. In the follow- 
ing three sections, we lay down the background that explains the mathematical 
tools we use in modeling human vision processes. In Section 3, we introduce the 
conformal camera and discuss the image projective transformations. We end 
this section with the construction of the group of image projective transforma- 
tions in the conformal camera. In Section 4, we review the geometry underlying 
the conformal camera and demonstrate that the fundamental properties of this 
geometry should be uniquely useful in the early- and intermediate-level vision
computational aspects of natural scene understanding. In the last of these three 
sections, Section 5, we show that the conformal camera possesses its own har- 
monic analysis — projective Fourier analysis — which gives efficient image repre- 
sentation well adapted to both the retinal image projective transformations and 
the retinotopy of the brain's visual pathways. Finally, in Section 6, we discuss 
some implementation issues when working with the discrete PFT. In particular, 
the binocular system with head, eyes, and visual cortex numerically integrated 
by PFT is discussed. Further, using this integrated binocular system, we model 
the perisaccadic perception, including the perisaccadic mislocalizations observed 
in psychophysical laboratory experiments. This perisaccadic mislocalization, in 
the form of perceptual space compression around the saccade target, is simulated 
in the model by the standard shift property of the Fourier transform. Future
directions in advancing our modeling and its implementation are also discussed.
The paper is summarized in the last section. 

The research program presented here advances our mathematical modeling 
intended for computational vision, including visual neuroscience and machine 
vision systems. It is guided by a strategy important in the contemporary neu- 
rocomputing research: linking known anatomical and physiological details with 
efficient computational modeling and engineering designs should be vital not 
only to the emerging field of neural engineering but also to interpreting relevant 
neurophysiological data. 

2 Visual Neuroscience Background 

2.1 Visual Perception is a Creative Process 

When light reflected from objects in the 3D world impinges upon the retina,
it activates the neuronal pathways, beginning with phototransduction by about 
125 million photoreceptors. Next, the visual information passes through a multi- 
layered circuitry of the retina where substantial processing takes place. 

The picture of retinal processing that has emerged only recently [20] tells us that
more than a dozen distinct visual recordings of the retinal image are extracted.
For example, one recording emphasizes the boundaries between objects while 
another carries information about movement in specific directions. The result 
is that more than a dozen of the most essential features of the original retinal 
image are extracted in parallel and sent to the brain as a train of spikes along 
about 1.5 million axons of ganglion cells to more than 30 association cortex 
areas containing about 30 billion neurons where the details: depth, texture, 
color, form, motion, etc., are added and integrated into a coherent view of the 
3D world. This integration is entirely dependent upon visual experience; almost 
all higher order features of vision are influenced by expectations based on past 
experience. Although such influences occasionally allow the brain to be fooled 
into misperception, as is the case with the optical illusion in Fig. 1, they also
give us the ability to see and respond to the visual world quickly. 

Figure 1: This illusion created by Adelson illustrates how perception may reflect
the complex properties of the environment. 



We see from this very brief description that visual perception is a creative 
process and, for this one reason alone, its quantitative modeling must be ex- 
tremely difficult. Therefore, we try to develop a model that captures only some 
of the very first computational aspects of visual perception that take place in the
first seconds following the opening of our eyes in daylight. Even with this limited
goal, we find that those aspects are controlled by extremely sophisticated
neural processes that involve nearly every level of the brain. 

2.2 Early Visual Pathways 

When humans open their eyes in daylight and direct their gaze to attend to a scene,
they see with the highest clarity only the central part, of about 2 deg of visual
angle. This region is projected onto the central fovea where its image is
sampled by the hexagonal mosaic of photoreceptors consisting mainly of cone
cells, the color-selective type of photoreceptors for sharp daylight vision.
The visual acuity decreases rapidly away from the fovea because the distance
between cones increases with eccentricity as they are outnumbered by rod cells,
photoreceptors for low-acuity black-and-white night vision. Moreover, there is
a gradual loss of hexagonal regularity of the photoreceptor mosaic. For example, 
at 2.5 deg radius, which corresponds to the most visually useful region of the 
retina, acuity drops 50%. 

The distribution of axons in the optic nerve, which carries the retinal pro- 
cessing output to the brain, is precisely organized, but varies along the visual 
pathways. One aspect of this organization, or the retinotopy, is that axons 
corresponding to neighboring places in the retina are positioned closely in the 
nerve bundle, with a notable exception along the vertical meridian. This exception
stems from the fact that the output of each eye splits along the retinal
vertical meridian when the axons originating from the nasal half of the retina 
cross at the optic chiasm to the contralateral brain's hemisphere and join the 
temporal half, which remains on the same side of its eye-of-origin. This splitting
and crossing re-organizes the retina outputs so that the left hemisphere desti- 
nations receive information from the right visual field, and the right hemisphere 
destinations receive information from the left visual field. According to the split 
theory [39] [43], which provides a greater understanding of vision cognitive processes
than the bilateral theory of overlapping projections, there is a sharp foveal
split along the vertical meridian of hemispherical cortical projections. Although 
it is crucial for synthesizing 3D representation from the binocular disparities in 
the pair of 2D retinal images, it presents a challenge in modeling retino-cortical 
image processing across visual hemifields. 

2.3 Beyond Early Visual Pathways: Visuo-Saccadic Per- 
ception 

One of the most important functions of any nervous system is sensing the ex- 
ternal environment and responding in a way that maximizes immediate survival 
chances. For this reason, perception and action have evolved in mammals by
supporting each other's functions. This functional link between visual percep- 
tion and oculomotor action is well demonstrated in primates when they execute 
the eye-scanning movements (saccades) in order to overcome the eye's acuity 
limitation in building up the scene understanding (see Fig. 2). 

The saccadic eye movement is the most common bodily movement since 
we make about three saccades per second at the eyeball's maximum speed of 
700 deg/sec. The eyes remain relatively still (while undergoing tremors, drifts 
and microsaccades, miniature random eye movements important for the proper
functioning of the eyes [1]) between consecutive saccades for about 180-320 ms,
depending on the task performed. During this time period, the image is pro- 
cessed by the retinal circuitry and sent mainly to the visual cortex (starting with 
the primary visual cortex, or V1, and reaching higher cortical areas, including
cognitive areas) with a minor part going to oculomotor midbrain areas. 

The sequence of saccades, fixations, and, often, also smooth-pursuit eye 
movements for tracking a slowly moving small object in the scene, is called the 
scanpath, first studied in In Fig. 2, (b) shows a progressively blurred 

image from (a), simulating the progressive loss of acuity with eccentricity. In 
Fig. 2 (c) we depict the scanpath that eyes might actually take to build up 
understanding of the scene. 

Although they are the simplest of bodily movements, the eyes' saccades are 
controlled by a widespread neural network that involves nearly every level of the
brain. Most prominently, it includes the superior colliculus (SC) of the midbrain
for representing possible saccade targets, the parietal eye field (PEF) and frontal 
eye field (FEF) in the parietal and frontal lobes of the neocortex (which obtain 
inputs from many visual cortical areas) for assisting the SC in the control of 
the involuntary (PEF) and voluntary (FEF) saccades. They also project to the 
simple neural circuits in the brainstem reticular formation in the midbrain that 
ensure the saccade's outstanding speed and precision. 

Figure 2: (a) San Diego skyline and harbor. (b) Progressively blurred image
from (a) simulating the progressive loss of retinal acuity with eccentricity. The
circle C1 encloses the part of the scene projected onto the high acuity fovea of
a 2 deg diameter. The circle C2 encloses the part projected onto the visually
useful foveal region of a 5 deg diameter. (c) A scanning path the eyes may take
to build the scene understanding. Adapted from [5].



Remarkably, many of the neural processes involved in saccade generation and 
control are amenable to precise quantitative studies such that even questions 
regarding the operation of the whole structure can be addressed by building on 
the existing models [21]. This not only carries immense clinical significance [15]
but also forms an essential preliminary stage in building our understanding of 
human vision, the knowledge that will eventually be transferred to the emerging 
field of neural engineering. 

Nevertheless, some neural processes of the visuo-saccadic system remain vir- 
tually unknown. Visual sensitivity is markedly reduced during saccadic move- 
ments as we do not see moving images on the retinas. This barely understood 
neural process is known as saccadic suppression. There is accumulating evidence 
that viewers integrate information poorly across fixations during tasks such as 
reading, visual search, and scene perception [50]. It means that, three times per 
second, there are instant large changes in the retinal images with almost no
information consciously carried between images. Furthermore, because the next
saccade target selection for the voluntary saccades takes place in the higher cortical
areas involving cognitive processes [25], the time needed for the oculomotor
system to plan and execute the saccadic eye movement could take as long as 150 
ms. Therefore, it is critical that visual information is efficiently acquired during 
each fixation period of about 300 ms without repeating much of the whole process
at each fixation, since doing so would require too many computational resources.
However, visual constancy, the fact that we are not aware of any discontinuity in 
the scene perception when executing the scanpath, is not perfect. About 50 ms 
before the onset of the saccade, during saccadic movement (~ 30 ms) and about 
50 ms after the saccade, perceptual space is transiently compressed around the 
saccade target [311 [S3] , a phenomenon called perisaccadic mislocalization. We 
continue this discussion in Section 6.5 where we present our modeling of the 
perisaccadic perception based on the projective Fourier transform of the conformal
camera. 



3 The Conformal Camera 

We model the human eyes' imaging functions with the conformal camera, the 
name of which will be explained later. The camera has many remarkable prop- 
erties, the first following directly from its construction: the group of image 
projective transformations in the conformal camera is generated internally and 
has the 'minimal' property as explained in Fig. 3. In the remaining pages of 
this article, the other properties will be carefully examined in their relation to 
many computational aspects of visual perception. 



Figure 3: (a) Image projective transformations are generated by iterations of
transformations covering translations 'h' and rotations 'k' of planar objects in
the scene. (b) The 2D section of the conformal camera further explains how
image projective transformations are generated and how the projective degrees
of freedom are reduced in the camera; one image projective transformation in
the conformal camera corresponds to different translations and rotations of
planar objects in the 3D world.



In the conformal camera, the retina is represented by the image plane $x_2 = 1$
with complex coordinates $x_3 + i x_1$, on which a 3D scene is projected under the
mapping

$$ j(x_1, x_2, x_3) = \frac{x_3 + i x_1}{x_2}. \tag{1} $$

The implicit assumption $x_2 \neq 0$ will be removed later. Next, we give the precise
form of the 'k' and 'h' image transformations introduced in Fig. 3.
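
As a minimal sketch of the projection (1), assuming a scene point is given by its real coordinates and that $x_2 \neq 0$, the mapping can be written as follows. The function name `project` is illustrative.

```python
def project(x1, x2, x3):
    """Conformal-camera projection j(x1, x2, x3) = (x3 + i*x1) / x2, as in (1)."""
    if x2 == 0:
        # The case x2 = 0 is excluded here; the text removes this assumption later.
        raise ValueError("x2 = 0 is not handled in this sketch")
    return complex(x3, x1) / x2

# Two scene points on the same ray through the origin share one image point.
z_on_plane = project(0.5, 1.0, 2.0)   # a point already on the image plane x2 = 1
z_far = project(1.0, 2.0, 4.0)        # the same ray, twice as far from the origin
```

Both calls return the same complex number, which reflects that the image point depends only on the ray through the origin.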

3.1 Basic Image Transformations 

The image projective transformations in the conformal camera are generated by
the following two transformations: (1) an image is projected by $(j|_{S^2_{(0,1,0)}})^{-1}$
into the unit sphere $S^2_{(0,1,0)}$ centered at (0, 1, 0), then the sphere is rotated and
the (rotated) image is projected by $j$ back to the image plane; (2) the image is
translated out of the image plane and then projected by $j$ back to the image plane.
Transformations (1) and (2) result in the 'k' and 'h' mappings in Fig. 3,
respectively. They are explicitly given as follows:

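1. k transformations: $SU(2) = \left\{ \begin{pmatrix} \alpha & \beta \\ -\bar{\beta} & \bar{\alpha} \end{pmatrix} : |\alpha|^2 + |\beta|^2 = 1 \right\}$ is the maximal compact
subgroup of SL(2,C), the group of 2 x 2 complex matrices of determinant 1. We
let the group SO(3) of three-dimensional rotations act on the sphere $S^2_{(0,1,0)}$
by rotating it about (0, 1, 0). Furthermore, we parametrize SO(3) by the Euler
angles $(\psi, \phi, \psi')$, where $\psi$ is the rotation about the $x_2$-axis, followed by the
rotation $\phi$ about the $x_3'$-axis, which is parallel to the $x_3$-axis and passes through
(0, 1, 0), and finally by the rotation $\psi'$ about the rotated $x_2$-axis. Then, to each
$R(\psi, \phi, \psi')$ in SO(3) there correspond two elements in SU(2),

$$ k(\psi, \phi, \psi') = \pm \begin{pmatrix} e^{i(\psi + \psi')/2} \cos\frac{\phi}{2} & i e^{i(\psi - \psi')/2} \sin\frac{\phi}{2} \\ i e^{-i(\psi - \psi')/2} \sin\frac{\phi}{2} & e^{-i(\psi + \psi')/2} \cos\frac{\phi}{2} \end{pmatrix}, \tag{2} $$

such that $j \circ R(\psi, \phi, \psi') \circ (j|_{S^2_{(0,1,0)}})^{-1}(z) = k \cdot z$ are given by the following
linear fractional mappings

$$ k(\psi, \phi, \psi') \cdot z = \frac{\left(e^{-i(\psi + \psi')/2} \cos\frac{\phi}{2}\right) z + i e^{-i(\psi - \psi')/2} \sin\frac{\phi}{2}}{\left(i e^{i(\psi - \psi')/2} \sin\frac{\phi}{2}\right) z + e^{i(\psi + \psi')/2} \cos\frac{\phi}{2}}. \tag{3} $$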

2. h transformations: Similarly, for each translation vector $b = (b_1, b_2, b_3)$,
where $b_2 \neq -1$, acting on the image plane by $T_b(x) = x + b$, there are two elements
in SL(2,C),

$$ h(b_1, b_2, b_3) = \pm \begin{pmatrix} (1 + b_2)^{1/2} & 0 \\ (b_3 + i b_1)(1 + b_2)^{-1/2} & (1 + b_2)^{-1/2} \end{pmatrix}, \tag{4} $$

such that $j \circ T_b(z) = h \cdot z$ are given by the corresponding linear
fractional mappings, by the same action as before,

$$ h(b_1, b_2, b_3) \cdot z = \frac{z + b_3 + i b_1}{1 + b_2}. \tag{5} $$
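
The two generators can be checked numerically with a short sketch. It assumes the matrix entries as read from (2) and (4) and the action $(a, b; c, d) \cdot z = (dz + c)/(bz + a)$ used in (3) and (5); the helper names `act`, `k_matrix`, and `h_matrix` are illustrative and not part of the original formulation.

```python
import cmath
import math

def act(g, z):
    """Action used in the text: ((a, b), (c, d)) . z = (d z + c) / (b z + a)."""
    (a, b), (c, d) = g
    return (d * z + c) / (b * z + a)

def k_matrix(psi, phi, psi2):
    """One of the two SU(2) elements +-k(psi, phi, psi') with entries as in (2)."""
    a = cmath.exp(1j * (psi + psi2) / 2) * math.cos(phi / 2)
    b = 1j * cmath.exp(1j * (psi - psi2) / 2) * math.sin(phi / 2)
    c = 1j * cmath.exp(-1j * (psi - psi2) / 2) * math.sin(phi / 2)
    d = cmath.exp(-1j * (psi + psi2) / 2) * math.cos(phi / 2)
    return ((a, b), (c, d))

def h_matrix(b1, b2, b3):
    """One of the two SL(2,C) elements +-h(b1, b2, b3) with entries as in (4)."""
    if 1 + b2 == 0:
        raise ValueError("b2 = -1 is excluded")
    s = cmath.sqrt(1 + b2)   # complex square root, so 1 + b2 < 0 is also covered
    return ((s, 0j), ((b3 + 1j * b1) / s, 1 / s))

def det(g):
    (a, b), (c, d) = g
    return a * d - b * c

z = 0.4 - 0.3j
k = k_matrix(0.3, 0.7, -0.2)
h = h_matrix(0.1, 0.5, -0.2)
det_k, det_h = det(k), det(h)
rotated = act(k, z)          # k . z as in (3)
translated = act(h, z)       # h . z, expected (z + b3 + i*b1) / (1 + b2) as in (5)
```

Under these assumptions both determinants equal 1 up to rounding, and `translated` agrees with the closed form (5). With $\phi = 0$ the k action reduces to multiplication by $e^{-i(\psi + \psi')}$.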
Now, if $f(z)$ is an image intensity function and $g$ is either a k or an h mapping,
the corresponding image transformation is $f(g^{-1} \cdot z)$.

Both $k \cdot z$ and $h \cdot z$ mappings have the form of special linear-fractional
transformations

$$ g \cdot z = \frac{\alpha z + \beta}{\gamma z + \delta}; \quad \alpha\delta - \gamma\beta = 1. $$

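These mappings are conformal, that is, they preserve the oriented angles between
the tangent vectors $z_k'(t_0)$ to any two curves $z_k(t)$ ($k = 1, 2$) intersecting at the
point $q = z_k(t_0)$. In fact,

$$ \frac{d}{dt} \left. \frac{\alpha z_k(t) + \beta}{\gamma z_k(t) + \delta} \right|_{t = t_0} = \frac{z_k'(t_0)}{(\gamma q + \delta)^2} = \frac{e^{i\chi(q)}}{|\gamma q + \delta|^2} \, z_k'(t_0), \quad k = 1, 2, \tag{6} $$

and both vectors $z_1'(t_0)$ and $z_2'(t_0)$ are rotated by the same angle $\chi(q)$.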
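
The angle preservation can be verified numerically. The sketch below assumes one arbitrarily chosen mapping with $\alpha\delta - \gamma\beta = 1$ and approximates the image tangent vectors by finite differences; the specific coefficients, the point `q`, and the step `eps` are illustrative choices.

```python
import cmath

# Coefficients of an example mapping with alpha*delta - gamma*beta = 1
alpha, beta, gamma, delta = 2 + 1j, 1 + 1j, 1 + 0j, 1 + 0j

def g(z):
    return (alpha * z + beta) / (gamma * z + delta)

q = 0.5 + 0.25j                      # intersection point of the two curves
t1, t2 = 1 + 0j, cmath.exp(0.8j)     # tangent vectors at q, 0.8 rad apart
eps = 1e-6

# Finite-difference approximations of the transformed tangent vectors
w1 = (g(q + eps * t1) - g(q)) / eps
w2 = (g(q + eps * t2) - g(q)) / eps

factor = 1 / (gamma * q + delta) ** 2    # derivative of g at q, as in (6)
chi = cmath.phase(factor)                # common rotation angle of both tangents
angle_before = cmath.phase(t2 / t1)
angle_after = cmath.phase(w2 / w1)
```

Here `angle_after` matches `angle_before` up to the finite-difference error, and `chi` equals minus twice the argument of $\gamma q + \delta$, which follows directly from the middle expression in (6).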



3.2 The Group of Image Projective Transformations 
3.2.1 The PSL(2,C) Group 

The group of image transformations in the conformal camera is generated by
all finite iterations of k and h mappings. To derive this group, we recall that
$k \in$ SU(2) and note that $h \in AN \subset$ SL(2,C) if $1 + b_2 > 0$ and $h \in \epsilon AN \subset$
SL(2,C) if $1 + b_2 < 0$, where




Now, it follows from the polar decomposition SL(2,C) = SU(2) A SU(2) that
all these finite iterations result in the group SL(2,C) acting by linear-fractional
mappings

$$ \mathrm{SL}(2,\mathbb{C}) \ni \begin{pmatrix} a & b \\ c & d \end{pmatrix} : z \mapsto \frac{dz + c}{bz + a}; \quad z = x_3 + i x_1 = (x_1, 1, x_3). \tag{8} $$
\c d J oz + a 

Because $\pm\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ have the same action, we need to identify matrices in
SL(2,C) that differ in sign. The result is the quotient group PSL(2,C) =
SL(2,C)/{±Id}, where Id is the identity matrix, and the action (8) establishes
a group isomorphism between linear-fractional mappings and PSL(2,C). Thus,

$$ \mathrm{PSL}(2,\mathbb{C}) \ni g = \pm\begin{pmatrix} a & b \\ c & d \end{pmatrix} : f(z) \mapsto f(g^{-1} \cdot z) = f\left(\frac{az - c}{-bz + d}\right) \tag{9} $$

gives the image projective transformations of the intensity function $f(z)$.
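
The group properties stated above can be checked with a small sketch. It assumes the action $(a, b; c, d) \cdot z = (dz + c)/(bz + a)$ from (8) and two arbitrarily chosen determinant-1 matrices; the helpers `act`, `matmul`, `inverse`, and `negate` are illustrative.

```python
def act(g, z):
    """Action (8): ((a, b), (c, d)) . z = (d z + c) / (b z + a)."""
    (a, b), (c, d) = g
    return (d * z + c) / (b * z + a)

def matmul(g, h):
    """Product of two 2 x 2 matrices given as nested tuples."""
    (a, b), (c, d) = g
    (p, q), (r, s) = h
    return ((a * p + b * r, a * q + b * s), (c * p + d * r, c * q + d * s))

def inverse(g):
    """Inverse of a determinant-1 matrix."""
    (a, b), (c, d) = g
    return ((d, -b), (-c, a))

def negate(g):
    return tuple(tuple(-x for x in row) for row in g)

g1 = ((1 + 1j, 1j), (1 + 0j, 1 + 0j))   # det = (1 + i) - i = 1
g2 = ((2 + 0j, 1j), (-1j, 1 + 0j))      # det = 2 - (i)(-i) = 1
z = 0.3 + 0.2j

composed = act(matmul(g1, g2), z)        # (g1 g2) . z
stepwise = act(g1, act(g2, z))           # g1 . (g2 . z)
same_sign_class = act(negate(g1), z)     # -g1 acts exactly like g1
pulled_back = act(inverse(g1), z)        # argument of f in (9)
```

Under these assumptions `composed` equals `stepwise`, so matrix multiplication corresponds to composition of the mappings, and `pulled_back` agrees with the closed form $(az - c)/(-bz + d)$ in (9).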



3.2.2 Conformality 

As we showed in (6), the mappings in (8) are conformal. Because of this property,
the camera is called 'conformal'. Although the conformal part of an image
projective transformation can be removed with almost no computational cost,
leaving only a perspective transformation of the image (see [SI1IS2]); the con- 
formality provides an advantage in imaging because the conformal mappings 
rotate and dilate the image infinitesimal neighborhoods, and, therefore, locally 
preserve the image 'pixels'. 

To complete the description of the conformal camera, we need to address
some implicit assumptions, such as the restriction $-bz + d \neq 0$ in (9), which we have
frequently made in this section.



4 Geometry of the Conformal Camera 

In the homogeneous coordinate framework of projective geometry [6], the conformal
camera is embedded into the complex plane

$$ \mathbb{C}^2 = \{ (z_1, z_2) : z_1 = x_2 + iy,\ z_2 = x_3 + i x_1 \}. $$

In this embedding, the 'slopes' $\xi$ of the complex lines $z_2 = \xi z_1$ are numerically
identified with the points on the extended image plane $\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$, where $\infty$
corresponds to the line $z_1 = 0$. We note that if $x_2 \neq 0$ and $y = 0$, the slope
$\xi = (x_3 + i x_1)/x_2$ corresponds to the point at which the ray (line) in $\mathbb{R}^3$ that passes
through the origin and $(x_1, x_2, x_3)$ intersects the image plane of the conformal camera.
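Now, the standard action of the group SL(2,C) on nonzero column vectors

$$ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} a z_1 + b z_2 \\ c z_1 + d z_2 \end{pmatrix} $$

implies that the slope $\xi = z_2 / z_1$ is mapped to the slope

$$ \xi' = \frac{z_2'}{z_1'} = \frac{c z_1 + d z_2}{a z_1 + b z_2} = \frac{c + d\xi}{a + b\xi}, $$

agreeing with the linear fractional mappings in (8).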

However, the action must be extended to include the line $z_1 = 0$ of 'slope'
$\infty$ as follows:

$$ g \cdot \infty = d/b, \qquad g \cdot (-a/b) = \infty. $$
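
A minimal sketch of the extended action follows. The sentinel `INF` for the point at infinity, the tolerance in the zero test, and the function name `act_ext` are illustrative assumptions; the case $b = 0$, in which $\infty$ is a fixed point, is added for completeness.

```python
INF = float("inf")   # sentinel for the point at infinity (illustrative choice)

def act_ext(g, xi):
    """Action of ((a, b), (c, d)) on a slope xi in the extended plane:
    xi -> (c + d*xi) / (a + b*xi), with g.INF = d/b and g.(-a/b) = INF."""
    (a, b), (c, d) = g
    if xi == INF:
        # For b = 0 the determinant condition gives d != 0, so INF stays fixed.
        return d / b if b != 0 else INF
    den = a + b * xi
    if abs(den) < 1e-12:
        return INF
    return (c + d * xi) / den

g = ((1 + 1j, 1j), (1 + 0j, 1 + 0j))      # an example element with det = 1
image_of_infinity = act_ext(g, INF)       # d / b
pole = -g[0][0] / g[0][1]                 # the slope -a / b
image_of_pole = act_ext(g, pole)          # mapped to INF
```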

The stereographic projection $\sigma = j|_{S^2_{(0,1,0)}}$ (with $j$ in (1)) maps $S^2_{(0,1,0)}$ bijectively
onto $\hat{\mathbb{C}}$, and $\sigma(0, 0, 0) = \infty$ gives a concrete meaning to the point $\infty$ such that
it can be treated as any other point of $\hat{\mathbb{C}}$. Thus, the geometry of the image plane
$\hat{\mathbb{C}}$ of the conformal camera with the image projective transformations given by
the group PSL(2,C) acting by linear-fractional transformations can be dually
described as follows:

1. $\hat{\mathbb{C}}$ is the complex projective line, i.e., $\hat{\mathbb{C}} \cong P^1(\mathbb{C})$, where

$$ P^1(\mathbb{C}) = \{\text{complex lines in } \mathbb{C}^2 \text{ through the origin}\} $$

with the group of projective transformations PSL(2,C). Thus, the image projective
transformations acting on the points of the extended image plane (or
simply, the image plane) of the conformal camera can be identified with pro- 
jective geometry (containing Euclidean geometry as a sub-geometry) of the 
one-dimensional complex line [B]. 

2. $\hat{\mathbb{C}}$ is the Riemann sphere, since under the stereographic projection $\sigma = j|_{S^2_{(0,1,0)}}$
we have the isomorphism $\hat{\mathbb{C}} \cong S^2_{(0,1,0)}$. The group PSL(2,C) acting on $\hat{\mathbb{C}}$ consists
of the bijective meromorphic mappings of $\hat{\mathbb{C}}$ [32]. Thus, it is the group of
holomorphic automorphisms of the Riemann sphere that preserve the intrinsic
geometry imposed by the complex structure, known as Möbius geometry [27] or
inversive geometry [13].
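
The identification of the extended image plane with the sphere can be made concrete with a short sketch. It assumes the unit sphere centered at (0, 1, 0), on which $x_1^2 + x_2^2 + x_3^2 = 2 x_2$, and derives the inverse of $\sigma$ from that equation; the sentinel `INF` and the function names are illustrative.

```python
INF = float("inf")   # sentinel for the point at infinity (illustrative choice)

def sigma(x1, x2, x3):
    """Stereographic projection: j restricted to the unit sphere centered at (0, 1, 0)."""
    if x2 == 0:
        return INF               # on this sphere x2 = 0 only at the origin
    return complex(x3, x1) / x2

def sigma_inv(z):
    """Inverse projection: the point of the sphere on the ray through z and the origin."""
    if z == INF:
        return (0.0, 0.0, 0.0)
    t = 2.0 / (1.0 + abs(z) ** 2)    # from |t*(Im z, 1, Re z)|^2 = 2*t
    return (t * z.imag, t, t * z.real)

z = 0.5 - 1.5j
p = sigma_inv(z)                                    # point on the sphere
on_sphere = p[0] ** 2 + (p[1] - 1.0) ** 2 + p[2] ** 2
back = sigma(*p)                                    # round trip to the image plane
```

In this sketch `on_sphere` equals 1 up to rounding and `back` recovers `z`, while the origin of the sphere is the only point sent to `INF`.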

What we have just described shows the following fundamental property: projective
geometry underlying the conformal camera, also called Möbius or inversive
geometry, and the holomorphic complex structure that provides the framework
for the development of complex numerical analysis are in fact two faces, one
'geometric' and the other 'numerical', of the same coin. We stress that the
real projective geometry underlying the pinhole camera and usually employed
in computer vision [49] [55] does not possess this fundamental property, which
sets our modeling of primate visual perception apart from other approaches.

4.1 The Conformal Camera and Visual Perception 

The image plane of the conformal camera does not admit a distance that is 
invariant under image projective (that is, linear-fractional) transformations. 
Therefore, geometry of the conformal camera does not possess a Riemannian 
metric; for instance, there is no curvature measure. As customary in complex 
projective (Möbius or inversive) geometry, we consider a line as a circle passing
through the point $\infty$. Then, the fundamental property of this geometry can be
expressed as follows: linear-fractional mappings take circles to circles. Thus,
circles can play the role of geodesics. Moreover, each circle carries a signature
of curvature, namely the inverse of its radius. We showed before that linear-fractional
mappings are conformal; we add here for completeness that the stereographic
projection $\sigma = j|_{S^2_{(0,1,0)}}$ is also conformal and maps circles in the sphere $S^2_{(0,1,0)}$ onto
circles in $\hat{\mathbb{C}}$. In conclusion, circles play a crucial role in the conformal camera
geometry and it should be reflected in psychological and computational aspects 
of natural scene understanding if this camera is relevant to modeling primate 
visual perception. 
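
The circle-preserving property can be checked numerically. The sketch below relies on a standard fact not stated in the text: four distinct points lie on a common circle or line exactly when their cross-ratio is real, and linear-fractional mappings leave the cross-ratio unchanged. The example matrix, the sample circle, and the helper names are illustrative.

```python
import cmath

def act(g, z):
    """Action used in the text: ((a, b), (c, d)) . z = (d z + c) / (b z + a)."""
    (a, b), (c, d) = g
    return (d * z + c) / (b * z + a)

def cross_ratio(z1, z2, z3, z4):
    return ((z1 - z3) * (z2 - z4)) / ((z1 - z4) * (z2 - z3))

g = ((1 + 1j, 1j), (1 + 0j, 1 + 0j))   # an example element with det = 1

# Four points on the circle with center 0.2 + 0.1i and radius 0.5
circle = [0.2 + 0.1j + 0.5 * cmath.exp(1j * t) for t in (0.1, 1.3, 2.9, 4.4)]
images = [act(g, z) for z in circle]

cr_before = cross_ratio(*circle)       # real, since the points are concyclic
cr_after = cross_ratio(*images)        # unchanged, so the images are concyclic too
```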

Neurophysiological experiments demonstrate that the retina performs filtering
of impinging images that extracts local contrast spatially and temporally. For
instance, center-surround cells at the retinal processing stage are triggered by
local spatial changes in intensity referred to as edges or contours. This filter- 
ing is enhanced in the primary visual cortex, the first cortical area receiving, 
via LGN, the retinal output, which itself is a case study in dense packing of 
overlapping visual submodalities: motion, orientation, frequency (color), and 
ocular dominance (depth). In psychological tests, humans easily detect
a significant change in spatial intensity (low-level vision), and effortlessly and 
unambiguously group this usually fragmented visual information (contours of 
occluded objects, for example), into coherent, global shapes (intermediate-level 
vision). Considering its computational complexity, it is one of the most difficult
problems that the primate visual system has to solve [55].

The Gestalt phenomenology and quantitative psychological measurements 
established the rules, summarized in the ideas of good continuation [35] [68] and
association field [19], that determine interactions between fragmented edges such
that they extend along continuous contours joining them in the way they will 
normally be grouped together to faithfully represent a scene. Evidence accumu- 
lated in psychological and physiological studies suggests that the human visual 
system utilizes a local grouping process (association field) with two very simple 
rules: coUinearity and co-circularity with underlying scale invariant statistics 
for both geometric arrangements in natural scenes. These rules were confirmed 
in [551112] by statistical analysis of natural scenes. Two basic intermediate-level 
descriptors that the brain employs in grouping elements into global objects are 
the medial axis transformation [TD] , or symmetry structure [ID] [H] , and the 
curvature extrema [3l[30]. In fact, the medial axis, which visual system extracts 
as a skeletal (intermediate- level) representation of objects [35], can be defined 
as the set of the centers of maximal circles inscribed inside the contour. The 
curvatures at the corresponding points of a contour are given by the inverse 
radii of the circles. 

From the above discussion we see that, on one hand, co-circularity and scale 
invariance emerge as the most basic concepts used by intermediate-level vision in 
solving the difficult problems of grouping local elements into individual objects 
of natural scenes. On the other hand, the non-metric projective geometry of the 
conformal camera that models eye imaging functions can be entirely constructed 
from circles such that co-circularity is preserved by projective transformations. 
Thus, it seems that the conformal camera would be very useful in modeling the eye's
imaging functions related to the lower and intermediate-level natural vision.

Other characteristics of the conformal camera that are uniquely useful in
modeling primate visual perception are discussed in the remaining part of this
article. Next, we briefly review the unity of geometry and numerical methods
by showing that the conformal camera has its own projective Fourier transform 
(PFT). 

5 Projective Fourier Analysis 

The projective Fourier analysis has been constructed by restricting geometric
Fourier analysis of SL(2,C), a direction in the representation theory of the
semisimple Lie groups [33], to the image plane of the conformal camera (see
Section 5.1 in [55]). The resulting projective Fourier transform (PFT) of a given
image intensity function $f(z) \in L^2(\mathbb{C})$ is the following

$$\hat{f}(s,k) = \frac{i}{2}\int f(z)\,|z|^{-is-1}\left(\frac{z}{|z|}\right)^{-k} dz\,d\bar{z} \tag{10}$$

where $(s,k) \in \mathbb{R} \times \mathbb{Z}$ and if $z = x_3 + ix_1$, then $\frac{i}{2}dz\,d\bar{z} = dx_3\,dx_1$ is the Euclidean
measure on the image plane. In this work, we consider only the noncompact
picture of PFT and, for a complete mathematical account that also includes
the compact picture, we refer to [62]. The noncompact and compact pictures
in the case of the Euclidean group correspond to the classical and spherical Fourier
analyses, respectively (Section 3 in [61]). In the next remark we justify the name
'projective Fourier transform' and, for a comprehensive account, we refer to [52].

Remark 1 The functions $\Pi_{s,k}(z) = |z|^{is}\left(\frac{z}{|z|}\right)^{k}$; $s \in \mathbb{R}$, $k \in \mathbb{Z}$ are all one-dimensional
unitary representations of the Borel subgroup B = MAN of SL(2,C),
and they play in (10) the role complex exponentials play in the classical Fourier
transform. These one-dimensional representations are all the finite-dimensional unitary
representations of the Borel subgroup B, as opposed to the fact that all nontrivial
unitary representations of SL(2,C) are infinite-dimensional. Furthermore, the group B
'exhausts' the projective group SL(2,C) by the Gauss decomposition $SL(2,C) \doteq \bar{N}B$,
where '$\doteq$' means that the equality holds up to a lower dimensional subset, that is,
almost everywhere, and $\bar{N}$ represents Euclidean translations.

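In log-polar coordinates $(u,\theta)$ given by $\ln re^{i\theta} = \ln r + i\theta = u + i\theta$, $\hat{f}(s,k)$
has the form of the standard Fourier integral

$$\hat{f}(s,k) = \int_0^{2\pi}\!\!\int_{-\infty}^{\infty} f(e^{u+i\theta})\,e^{u}\,e^{-i(us+\theta k)}\,du\,d\theta, \tag{11}$$

where we used $\frac{i}{2}dz\,d\bar{z} = e^{2u}du\,d\theta$. We see that a function $f$ that is integrable
on $\mathbb{C}^* = \mathbb{C}\setminus\{0\}$ has finite PFT,

$$|\hat{f}(s,k)| \le \int_0^{2\pi}\!\!\int_{-\infty}^{\infty} f(e^{u+i\theta})\,e^{u}\,du\,d\theta = \int_0^{2\pi}\!\!\int_0^{\infty} f(re^{i\theta})\,dr\,d\theta < \infty. \tag{12}$$

Therefore, this $f$ can be extended to $\mathbb{C}$ by $f(0) = 0$. Thus, in spite of the
logarithmic singularity of log-polar coordinates, the projective Fourier transform
of integrable functions on $\mathbb{C}$ is finite. This observation will be crucial when we
discretize the PFT in the next section.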
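
For completeness, the change of variables that takes (10) to (11) can be spelled out. This is a standard computation, written here with the substitution $z = e^{u+i\theta}$, $r = e^{u}$:

```latex
\begin{align*}
|z| &= e^{u}, \qquad \frac{z}{|z|} = e^{i\theta}, \qquad
\tfrac{i}{2}\,dz\,d\bar{z} = r\,dr\,d\theta = e^{2u}\,du\,d\theta,\\[4pt]
|z|^{-is-1}\left(\frac{z}{|z|}\right)^{-k}\tfrac{i}{2}\,dz\,d\bar{z}
 &= e^{-isu-u}\,e^{-ik\theta}\,e^{2u}\,du\,d\theta
  = e^{u}\,e^{-i(us+\theta k)}\,du\,d\theta .
\end{align*}
```

The surviving factor $e^{u}\,du = dr$ is what turns the bound in (12) into an integral with respect to $dr\,d\theta$.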

Inverting (11), which is done in the $(u,\theta)$-space, we get

$$e^{u}\,\mathsf{f}(u,\theta) = \frac{1}{(2\pi)^2}\sum_{k=-\infty}^{\infty}\int \hat{f}(s,k)\,e^{i(us+\theta k)}\,ds, \tag{13}$$

where $\mathsf{f}(u,\theta) = f(e^{u+i\theta})$. We stress that although $f(e^{u+i\theta})$ and $\mathsf{f}(u,\theta)$ are
numerically equal, they are given on different spaces; $f(e^{u+i\theta})$ is on the image
plane in polar coordinates and $\mathsf{f}(u,\theta)$ is on the space defined by rectangular
$(u,\theta)$-coordinates.

Finally, by expressing (13) in the z-variable, we obtain the inverse projective
Fourier transform

$$f(z) = \frac{1}{(2\pi)^2}\sum_{k=-\infty}^{\infty}\int \hat{f}(s,k)\,|z|^{is-1}\left(\frac{z}{|z|}\right)^{k} ds. \tag{14}$$

5.1 Discrete Projective Fourier Transform 

To discretize the PFT we use the fact that $\hat{f}(s,k)$ is finite for an integrable
function $f$, see (12). By removing a disk $|z| \le r_a$, we can assume that the
support of $\mathsf{f}(u,\theta)$ is contained within $(\ln r_a, \ln r_b) \times [0, 2\pi)$. We approximate the
integral in (11) by a double Riemann sum

$$\hat{f}(2\pi m/T, n) \approx \frac{2\pi T}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} f(e^{u_k}e^{i\theta_l})\,e^{u_k}\,e^{-2\pi i(mk/M+nl/N)}$$

with $M \times N$ partition points

$$(u_k,\theta_l) = (\ln r_a + kT/M,\ 2\pi l/N);\quad 0 \le k \le M-1,\ 0 \le l \le N-1,\ T = \ln(r_b/r_a). \tag{15}$$

Then, introducing

$$f_{k,l} = (2\pi T/MN)\,f(e^{u_k}e^{i\theta_l}) \quad\text{and}\quad \mathsf{f}_{k,l} = (2\pi T/MN)\,\mathsf{f}(u_k,\theta_l) \tag{16}$$

and defining $\hat{f}_{m,n}$ by

$$\hat{f}_{m,n} = \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} f_{k,l}\,e^{u_k}\,e^{-i2\pi mk/M}\,e^{-i2\pi nl/N}, \tag{17}$$

we obtain

$$\mathsf{f}_{k,l} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \hat{f}_{m,n}\,e^{-u_k}\,e^{i2\pi mk/M}\,e^{i2\pi nl/N}. \tag{18}$$

We note that $\hat{f}_{m,n} \approx \hat{f}(2\pi m/T, n)$ and refer to [28] for a discussion of
numerical aspects of the approximation. Both expressions (17) and (18) can
be computed efficiently by FFT algorithms since the exponents are taken at
equidistant points. See the simulation for a bar pattern in Fig. 4.
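
The FFT evaluation of (17) and (18) can be sketched as follows. This is a minimal illustration, assuming NumPy; the function names `dpft` and `idpft`, the grid sizes, the radii `r_a` and `r_b`, and the random test samples are hypothetical, and the weights $2\pi T/MN$ of (16) are taken to be already absorbed into the samples.

```python
import numpy as np

def dpft(f, u):
    """Eq. (17): sum_{k,l} f[k,l] e^{u_k} e^{-2 pi i mk/M} e^{-2 pi i nl/N}."""
    return np.fft.fft2(f * np.exp(u)[:, None])

def idpft(f_hat, u):
    """Eq. (18): (1/MN) sum_{m,n} f_hat[m,n] e^{-u_k} e^{+2 pi i (mk/M + nl/N)}."""
    return np.exp(-u)[:, None] * np.fft.ifft2(f_hat)

# Illustrative grid: M rings, N sectors, annulus r_a <= |z| < r_b.
M, N = 8, 16
r_a, r_b = 2.0, 90.0
T = np.log(r_b / r_a)
u = np.log(r_a) + np.arange(M) * T / M      # u_k from (15)

rng = np.random.default_rng(0)
f = rng.random((M, N))                      # stand-in for the samples f_{k,l}

f_hat = dpft(f, u)
f_cortical = idpft(f_hat, u).real

# Cross-check one coefficient against the double sum in (17).
m, n = 3, 5
k = np.arange(M)[:, None]
l = np.arange(N)[None, :]
direct = np.sum(f * np.exp(u)[:, None]
                * np.exp(-2j * np.pi * (m * k / M + n * l / N)))
```

Here `np.fft.ifft2` supplies the $1/MN$ normalization of (18), so the round trip returns the input samples.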

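Finally, on introducing $z_{k,l} = e^{u_k+i\theta_l}$ into (17) and (18), we arrive at the
(M, N)-point discrete projective Fourier transform (DPFT) and its inverse:

$$\hat{f}_{m,n} = \sum_{k=0}^{M-1}\sum_{l=0}^{N-1} f_{k,l}\left(\frac{z_{k,l}}{|z_{k,l}|}\right)^{-n}|z_{k,l}|^{-i2\pi m/T+1} \tag{19}$$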
Figure 4: Exp-polar sampling (the distance between circles partially displayed in
the first quadrant changes exponentially) of a bar pattern on the retina is shown 
on the left. The bar pattern in the cortical space rendered by the inverse DPFT 
computed with FFT is shown on the right. The cortical uniform sampling grid, 
which is obtained by applying complex logarithm to the exp-polar grid in (a), 
is shown only in the upper left corner. 



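and

$$f_{k,l} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \hat{f}_{m,n}\left(\frac{z_{k,l}}{|z_{k,l}|}\right)^{n}|z_{k,l}|^{i2\pi m/T-1} \tag{20}$$

now with $f_{k,l} = (2\pi T/MN)\,f(z_{k,l})$. The projectively adapted characteristics of
the discrete projective Fourier analysis can be expressed as follows:

$$f'_{k,l} = \frac{1}{MN}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \hat{f}_{m,n}\left(\frac{z'_{k,l}}{|z'_{k,l}|}\right)^{n}|z'_{k,l}|^{i2\pi m/T-1} \tag{21}$$

where $z'_{k,l} = g^{-1}\cdot z_{k,l}$, $g \in SL(2,C)$ and $f'_{k,l} = (2\pi T/MN)\,f(z'_{k,l})$.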

Although projective characteristics must be derived in z-coordinates, in practical
image processing, (21) should be expressed in log-polar coordinates to be
fast computable by FFT. To this end, let $(u'_{m,n}, \theta'_{m,n})$ denote log-polar coordinates
of $z'_{m,n} = e^{u'_{m,n}}e^{i\theta'_{m,n}}$. In these coordinates, (21) is given by the following
expression (see [511 El] for details)

$$\mathsf{f}'_{m,n} = \frac{1}{MN}\sum_{k=0}^{M-1}\sum_{l=0}^{N-1} \hat{f}_{k,l}\,e^{-u'_{m,n}}\,e^{i2\pi k u'_{m,n}/T}\,e^{il\theta'_{m,n}}.$$

Thus, we can render image projective transformations in terms of the projective
Fourier transform of the original image only.



6 DPFT in Computational Vision 

We discussed before the relevance of the conformal camera to the intermediate- 
level vision task of grouping image elements into individual objects in natural 
scenes. Here we want to discuss the relevance of the data model of image representation
based on projective Fourier analysis to image processing in computational
vision, including visual neuroscience and biologically motivated machine
vision systems. 

6.1 Modeling the Retinotopy 

The mappings $w = \ln(z \pm a) - \ln a$, with $a > 0$ and $\pm a$ indicating, for different
signs, the left or right brain hemisphere, are accepted approximations of the
topographic structure of primate primary visual cortex (V1) [54], where the
parameter $a$ removes the singularity of the logarithm. However, the discrete
projective Fourier transform (DPFT) that provides the data model for retinal
image representation can be efficiently computed by FFT only in log-polar coordinates
given by the complex logarithm $w = \ln z$, the mapping with distinctive
rotational and zoom symmetries:

$$\ln(e^{i\theta}z) = \ln z + i\theta, \qquad \ln(\rho z) = \ln z + \ln\rho.$$

Thus, we see that the Schwartz model of the retina comes with drastic consequences;
it destroys rotation and zoom symmetries. We also recall that the PFT in
log-polar coordinates does not have a singularity at the origin, see (12).
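
The two symmetries, and their loss under the shifted logarithm, can be illustrated numerically. The sketch below assumes NumPy; the test point, the values of `theta`, `rho`, and `a`, and the helper name `schwartz` are illustrative only, and a single hemisphere (the `+a` sign) is used.

```python
import numpy as np

def schwartz(z, a):
    """Shifted-logarithm map w = ln(z + a) - ln(a) for one hemisphere."""
    return np.log(z + a) - np.log(a)

z = 3.0 * np.exp(0.4j)
theta, rho, a = 0.7, 2.5, 0.5

# Complex logarithm: rotation and zoom act as pure translations.
rot_offset = np.log(np.exp(1j * theta) * z) - np.log(z)   # expected: i * theta
zoom_offset = np.log(rho * z) - np.log(z)                 # expected: ln(rho)

# Shifted logarithm: the offset produced by the same zoom depends on eccentricity.
near = schwartz(rho * 0.2, a) - schwartz(0.2, a)     # point with |z| < a
far = schwartz(rho * 20.0, a) - schwartz(20.0, a)    # point with |z| >> a
print(rot_offset, zoom_offset, near, far)
```

For $|z| \gg a$ the zoom offset approaches $\ln\rho$, while close to the origin it is noticeably different, so a zoom is no longer a uniform shift in the shifted-logarithm coordinates.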

The following facts support our modeling with DPFT. First, for small $|z| \ll a$,
$\ln(z \pm a) - \ln a$ is approximately linear while, for large $|z| \gg a$, it is dominated
by $\ln z$. Secondly, to construct discrete sampling for DPFT, the image
was regularized by removing a disc representing the fovea (see previous section).
Thirdly, there is accumulated evidence pointing to the fact that the fovea and
periphery have different functional roles in vision [51], [52], [70] and likely involve
different image processes. Finally, by the split theory of hemispherical image
representation, which we mentioned before, the foveal region has a discontinuity
along the vertical meridian, with each half processed in a different brain
hemisphere [39]. We note that the two hemispheres are connected by a massive
bridge of 500 million neuronal axons called the corpus callosum.

We conclude this discussion with the following remarks: both our model and
Schwartz's model in [SI] (see Fig. 5), as well as all other similar models,
are, in fact, fovea-less models [67]. Furthermore, since the fovea is explicitly 
removed in our modeling, we expect to extend the present model to include 
foveal representation in the next stage of this modeling. In fact, the lack of the 
fovea in our modeling is one of the challenges that is stalling implementation of 
the model. We continue this discussion in Section 6.5.1. 

6.2 On Numerical Implementation of DPFT 

The DPFT approximation was obtained using the rectangular sampling grid
$(u_k,\theta_l)$ in (15), corresponding, under the mapping

$$u_k + i\theta_l \longmapsto z_{k,l} = e^{u_k+i\theta_l} = r_k e^{i\theta_l},$$

Figure 5: (a) Schwartz model of the retina: the strip of width 2a is removed and
two half-maps of In z are shifted to meet along the vertical meridian, (b) Our 
model: the fovea is removed and the retina is split along the vertical meridian, 
conforming to the split theory of the retino-cortical projection. 



to a nonuniform sampling grid with equal sectors

$$\alpha = \theta_{l+1} - \theta_l = \frac{2\pi}{N}, \quad l = 0, 1, \dots, N-1 \tag{22}$$

and with ring radii increasing exponentially

$$\rho_k = r_{k+1} - r_k = e^{u_{k+1}} - e^{u_k} = e^{u_k}(e^{\delta} - 1) = r_k(e^{\delta} - 1), \quad k = 0, 1, \dots, M-1, \tag{23}$$

where $\delta = u_{k+1} - u_k$. The radii $r_k = r_0e^{k\delta}$ are given in terms of the spacing
$\delta = T/M$ and $r_0 = r_a$, where $r_a$ is the radius of the disc that has been removed to
regularize the logarithmic singularity, see (15).



Let us assume that we have been given a picture of the size A × B in pixel
units, which is displayed with K dots per unit length (dpi). Then, the physical
dimensions, in the chosen unit of length, of the pixel and the picture are
1/K × 1/K and A/K × B/K, respectively. Also, we assume that the retinal
coordinates' origin (fixation) is the picture's center.

The central disc of radius $r_0$ represents the fovea with uniformly distributed
grid points and the number of the foveal pixels $N_f$ given by $\pi r_0^2 = N_f/K^2$.
This means that the fovea cannot increase the resolution, which is related to
the distance of the picture from the eye. The number of sectors is obtained
from the condition $2\pi(r_0 + r_1)/2 \approx N(1/K)$, where $N = [2\pi r_0K + \pi]$. Here
$[a]$ is the closest integer to $a$. To get the number of rings M, we assume that
$\rho_0 = r_0(e^{\delta} - 1) = 1/K$ and $r_b = r_M = r_0e^{M\delta}$. We can take either
$r_b = (1/K)\min(A,B)/2$ or $r_b = (1/K)\sqrt{A^2 + B^2}/2$. Thus, $\delta = \ln[1 + 1/(r_0K)]$ and
$M = (1/\delta)\ln(r_b/r_0)$.

Example 2 We let A × B = 512 × 512 and K = 4 per mm, so the physical
dimensions in mm are 128 × 128 and $r_b = 128\sqrt{2}/2 \approx 90.5$. Furthermore, we
let $N_f = 296$, so $r_0 = 2.427$ and $N = 64$. Finally, $\delta = \ln(10.7084/9.7084) \approx
0.09804$ and $(1/0.09804)\ln(90.5/2.427) \approx M = 37$. The sampling grid consists
of points in polar coordinates: $(r_k + \rho_k/2,\ \theta_l + \pi/64) = (2.552e^{0.09804k},\ (2l +
1)\pi/64)$, $k = 0, 1, \dots, 36$, $l = 0, 1, \dots, 63$.

In this example, the number of pixels in the original image is 262,144,
whereas the foveal (uniform sampling) and peripheral (log-polar sampling)
representation of the image contain only 2,664 pixels.
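
The arithmetic of Example 2 can be reproduced directly from the formulas of this section. This is a small sketch using only the Python standard library; the variable names are illustrative, and the diagonal choice $r_b = (1/K)\sqrt{A^2+B^2}/2$ is the one used in the example.

```python
import math

A = B = 512      # picture size in pixels
K = 4            # pixels per mm
N_f = 296        # number of foveal pixels

r0 = math.sqrt(N_f / (math.pi * K ** 2))        # from pi * r0^2 = N_f / K^2
N = round(2 * math.pi * r0 * K + math.pi)       # number of sectors
delta = math.log(1 + 1 / (r0 * K))              # from r0 * (e^delta - 1) = 1/K
r_b = math.hypot(A, B) / (2 * K)                # half-diagonal in mm
M = round(math.log(r_b / r0) / delta)           # number of rings

total = N_f + M * N                             # foveal + peripheral samples
ratio = (A * B) / total                         # reduction factor

print(f"r0={r0:.3f} mm, N={N}, delta={delta:.5f}, r_b={r_b:.1f} mm, M={M}")
print(f"samples: {total} vs {A * B} pixels, ratio {ratio:.1f}")
```

Small differences in the last digits of $\delta$ relative to the example come from rounding $r_0$ to 2.427 before taking the logarithm.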

We stress again that $f_{k,l}$ and $\mathsf{f}_{k,l}$ are discretizations of the same image in
different planes; $f_{k,l}$ are the image samples in the image plane sampled on a
nonuniform grid $(e^{u_k}e^{i\theta_l})$, while the inverse DPFT output (18) gives the image
samples $\mathsf{f}_{k,l}$ on the uniform grid $(u_k,\theta_l)$, where $u_k = \ln r_k$.

In summary, a simple description of the imaging model based on DPFT is as
follows: an image (analog or digital) of a scene impinged on the retina is sampled
on a nonuniform exp-polar grid, $\{r_ke^{i\theta_l}\}_{M\times N}$, that approximates the density
distribution of retinal ganglion cells, giving the set of pixels $\{f_{k,l}\}_{M\times N}$. In this
grid, the radial spacing changes exponentially: $r_k = r_ae^{\delta k}$, $k = 1, 2, \dots, M$, and
the angular spacing is constant: $\theta_l = \alpha l$, $l = 1, 2, \dots, N$. As was shown in
Example 2, this sampling results in about 100 times fewer pixels than in the original
image. To render $\{f_{k,l}\}_{M\times N}$, the DPFT is formed and computed by FFT in
log-polar coordinates $(u_k,\theta_l)$ obtained by applying a complex logarithm as follows:
$\ln(r_ae^{\delta k}e^{i\alpha l}) = \ln r_a + \delta k + i\alpha l = u_k + i\theta_l$, resulting in the set
$\{\hat{f}_{k,l}\}_{M\times N}$. Next, the IDPFT is assembled and computed again by FFT, this time
giving the image samples $\mathsf{f}_{k,l} = f_{k,l}$ rendered in cortical (log-polar) coordinates $(u_k,\theta_l)$.
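
The pipeline summarized above can be sketched end to end on a synthetic bar image. This is a rough illustration under several assumptions that are not from the article: nearest-neighbor sampling, indices running from 0 instead of 1, a hypothetical helper `exp_polar_sample`, an arbitrary assignment of the image axes to the retinal coordinates, and clipping of grid points that fall outside the picture. The grid parameters are those of Example 2.

```python
import numpy as np

def exp_polar_sample(img, r_a, delta, M, N, K=1.0):
    """Nearest-neighbor samples of img on the exp-polar grid r_k e^{i theta_l}."""
    H, W = img.shape
    cy, cx = (H - 1) / 2, (W - 1) / 2              # fixation at the picture center
    r = r_a * np.exp(delta * np.arange(M))         # ring radii in physical units
    th = 2 * np.pi * np.arange(N) / N              # constant angular spacing
    x = cx + K * r[:, None] * np.cos(th)[None, :]  # pixel coordinates of the grid
    y = cy - K * r[:, None] * np.sin(th)[None, :]
    xi = np.clip(np.rint(x).astype(int), 0, W - 1) # clip points outside the picture
    yi = np.clip(np.rint(y).astype(int), 0, H - 1)
    return img[yi, xi], np.log(r)                  # samples and u_k = ln r_k

# Synthetic 512 x 512 picture with a vertical bar through the fixation point.
img = np.zeros((512, 512))
img[:, 250:262] = 1.0

r_a, delta, M, N, K = 2.427, 0.09804, 37, 64, 4.0
samples, u = exp_polar_sample(img, r_a, delta, M, N, K)

# DPFT (17) and inverse DPFT (18), both by FFT.
f_hat = np.fft.fft2(samples * np.exp(u)[:, None])
cortical = (np.exp(-u)[:, None] * np.fft.ifft2(f_hat)).real
```

The array `cortical` holds the same values as `samples`, now indexed by the uniform $(u_k,\theta_l)$ grid, so displaying it as a rectangular image gives the log-polar (cortical) rendering of the bar.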



6.3 Relation to Other Numerical Approaches 

From the numerical approaches to foveate (or space-variant) vision, involving, 
for example, Fourier- Mellin transform or log-polar Hough transform, the most 
closely related to our work are results reported by Schwartz' group at Boston 
University. We note that the approximation of the retinotopy by a complex 
logarithm (see Section 6.1) was first proposed by Eric Schwartz in 1977. This 
group introduced the fast exponential chirp transform (FECT) fll in their 
attempt to develop numerical algorithms for space-variant image processing. 
Basically, both FECT and its inverse were obtained by the change of variables 
in both the spatial and frequency domains in the standard Fourier integrals. 
The discrete FECT was introduced somewhat ad hoc, without reference to numerical
aspects of the approximation. Moreover, some basic components of
Fourier analysis, such as the underlying geometry or the Plancherel measure, were not
considered. In comparison, the projective Fourier transform (PFT) provides an efficient
image representation well adapted to projective transformations produced
in the conformal camera by the group SL(2, C) acting on the image plane by 
linear-fractional mappings. Significantly, PFT can be obtained by restricting 
geometric Fourier analysis of the Lie group SL(2,C) to the image plane of the 
conformal camera. Thus, the conformal camera comes with its own harmonic 
analysis. Moreover, PFT is computable by FFT in log-polar coordinates given 
by a complex logarithm that approximates the retinotopy. It implies that PFT 
can integrate the head, eyes, and visual cortex into a single computational system.
This aspect is discussed, with special attention to perisaccadic perception,
in the remaining part of the paper. Another advantage of PFT is the complex 
(conformal) geometric analysis underlying the conformal camera. We discussed, 
in Section 4.1, the relation of this geometry to the intermediate-level vision prob- 
lem of grouping local contours into individual objects and the background of 
natural scenes. 

The other approaches to space-variant vision use geometric transformations,
mainly based on a complex logarithmic function, between the nonuniform
(retinal) sampling grid and the uniform (cortical) grid for the purpose of developing
computer programs. These approaches can be classified into two different
groups. The first group of problems deals with visualizing and classifying large
information data sets. We give two examples for the first group. The first deals
with the problem of mapping information space to the image space for navigation
through complex two-dimensional data sets when viewing small details and, at
the same time, the general overview [T^]. The second gives model-based image
processing in mathematical morphology for qualifying/segmenting/quantifying
spots topology in genomic microarray-based data [T]. The second group of problems
is related to robotic vision. We give only a few examples of such problems,
which include tracking [7], navigation [5], detection of salient regions [S7], and disparity
estimation [42]. However, it seems that they share one common problem:
high computational costs in the geometric transformation process.

In the next figure, we show a simulation applied to Fig. 2 (a) with the
software available over the internet [8]. In Fig. 6, the San Diego skyline and harbor
shown in (a) is sampled in retinal exp-polar coordinates (with the vertical
meridian deleted according to the split theory discussed before) and mapped by 
a complex logarithm transformation to rectangular log-polar coordinates (b). 
The inverse geometric transformation shown in (c) results in the retinal im- 
age that simulates the sampling by the ganglion cells density as a function of 
eccentricity. 

We note that the image processing presented here (see the last paragraph in 
the previous section) differs from the above simulation by one crucial aspect: we 
use projective Fourier analysis framework for image representation that provides 
low computational cost of the retino-cortical (logarithmic) transformation. 



6.4 DPFT and Binocular Vision 

In order to carry out numerical experiments with the discrete PFT, the conformal
camera should work in the following setup: we get a set of samples
$f_{k,l} = f(e^{u_k}e^{i\theta_l})$ of an image $f$ from a camera with anthropomorphic visual
sensors [5] or an 'exp-polar' scanner with the sampling geometry similar to the
distribution density of the retinal ganglion cells. Next, we form the DPFT $\hat{f}_{k,l}$
according to (17) and compute it with FFT. Then, we compute the IDPFT of $\hat{f}_{k,l}$
given in (18), again with FFT. However, this output from the IDPFT renders the
retinotopic image $\mathsf{f}_{k,l}$ of the retinal samples in cortical log-polar coordinates.
Figure 6: (a) San Diego skyline and harbor, (b) Its log-polar image, the vertical
meridian deleted, obtained by the geometric transformation of both the polar
samples with the radial partition changing exponentially and a constant angular
partition, into regular samples in log-polar rectangular plane (c).



This setup provides an efficient model that integrates the head, eyes, and the 
cortex into a single computational system, which is introduced next. 

We discuss this integrated system by assuming that a 3D scene consists of 
a gray square with a red bar located in front of it (see Fig. 7). The integrated 



Figure 7: The scene consisting of a gray square with a red bar in front of it is 
seen by an observer. The visual pathway with the major cortical areas is shown. 



binocular system with eyes modeled by the conformal cameras and this scene 
as seen from above is shown in Fig. 8. 

A simulation of the integrated binocular system with the gray square and red bar
scene can be seen in Fig. 9. Each eye sees the scene from a different vantage 
point ((a) and (c) in Fig. 9), as the eyes are separated laterally. The retinal 
projections are sampled on the exp-polar grid with the meridian line removed 
as implied by the split theory. 

The retinotopic images are simulated in Matlab using the program from 
Figure 8: The head-eyes-visual cortex integrated system. Following from the
fact that eyes are modeled by the conformal camera, theoretical horopters are
conics that resemble empirical horopters.



[8], and the cut-and-paste transformations are used to account for the global 
retinotopy topology. For example, the output from FFT computing the inverse 
DPFT of the scene projected on the right eye and sampled by ganglion cells is 
shown in Fig 10 (b). The cut-and-paste operation is applied to the output in 
Fig. 10 (b) and to the corresponding DPFT output of the left eye to obtain (f) 
in Fig. 9. 

6.5 Modeling Perisaccadic Perception with DPFT 

Because of acuity limitations of foveate vision, a sequence of fast eye rotations 
is necessary for processing the details of the scene by fixating eyes consecutively 
on the targets of interest. The sequence of fixations, saccades and smooth 
pursuits, called the scanpath, is the most basic feature of foveate vision (cf.
Fig. 2). The fact that we do not see moving images on the retinas points to
a poor integration of visual information across fixations during tasks such as 
reading, visual searching, or looking at a scene. Given the limited computational 
resources, it is critical that visual information is not only efficiently acquired 
during each fixation, but also that it is done without starting anew much of this 
acquisition process at each fixation. 

Although we are not aware of discontinuities in a scene perception when ex- 
ecuting a scanpath, this visual constancy is not perfect. In psychophysical lab- 
oratory experiments, the phenomenon of perisaccadic compression is observed: 
before the onset of the saccade, brief flashes are perceived by human subjects 
to be compressed around the impending saccade target [51], [33], see Fig. 11.
However, perisaccadic perception experiments have revealed a multitude of mis- 
Figure 9: In (a) and (c), the 3D scene from Fig. 8 is seen from a different
vantage point by each eye (i.e., the conformal camera) due to the eyes' lateral
displacement. The Matlab-simulated right and left retinal projections and the
retinotopic image can be seen in (b), (d) and (f), respectively.



localization phenomena, pointing to the involvement of many different neural 
processes. Accordingly, many different theories have been proposed, see [25]. 

Two computational theories of perisaccadic vision that have been proposed 
in visual neuroscience are related to our modeling. The first theory, suggested 
in [66], states that an efference copy generated by SC, a copy of an oculomotor 
command to rotate eyes in order to execute the saccade, is used to uniformly 
shift cortical neural activity representing spatial locations of the saccade target 
area toward foveal representation. It was proposed that this shift is reflected 
on the neuronal level by a transient spatial remapping of the receptive fields in 
numerous retinotopically organized cortical areas ([1H1|37|), including the superior
colliculus (SC), parietal eye field (PEF), and frontal eye field (FEF). It can
explain the perceived increase in spatial resolution around the saccade target as 
more foveal neurons are available there to process the details of objects. Fur- 
thermore, because the shift occurs in logarithmic coordinates that approximate 
retinotopy, the model can also explain perceived perisaccadic compression. 

The second theory, ^5], explains the perisaccadic compression by directing 
spatial attention to the target of a planned saccade. The proposed computa- 
tional model assumes that the initial stimulus neuronal activity in the visual 
cortical area is distorted by the feedback of the retinotopically organized activity
hill of the saccade target in the oculomotor SC layer, which pushes the
population response of the flashed stimulus in retinotopically organized cortical
Figure 10: (a) The simulation of the right eye's projected scene sampled by
ganglion cells. (b) The retinotopic image of the sampled projection shown in
(a); the vertical size corresponds to the length of the angular interval $[-\pi, \pi]$.



areas (including PEF and FEF) towards the saccade target. This boost of performance
at the target location of the saccade, which occurs immediately before the
saccade onset, increases spatial discrimination. The shift of the neuronal activity
in logarithmic coordinates, and hence perisaccadic compression, is a direct
consequence of it.

Because circuitry underlying receptive field remapping is widespread and 
not well understood, it cannot be easily decided whether saccadic remapping is 
the cause or consequence of saccadic compression. For example, only recently 
it was reported in [46] that a phenomenon very similar to the remapping occurs
in extrastriate (V4, and, though progressively weaker, V3, V2 and V1) cortical
areas in humans. Remarkably, remapping in extrastriate cortex could be func- 
tionally related to the integration of visual information from a constant object 
across saccades [53]. 

In this section, we model perisaccadic perception using the integrated binocular
system, addressing the presaccadic activity, which consists of shifts of
neurons' current receptive fields to their future postsaccadic locations and
is thought to underlie the scene remapping based on the anticipated saccadic eye
movement (efference copy), with the accompanying perisaccadic compression of
perceptual space. The postsaccadic activity, during which the actual integration of visual
features takes place, will be considered in the next stage of our modeling.
Although our modeling directly conforms to the theory in [66], it may also be
useful, on the image processing level, in representing the resulting receptive 
field shifts from 'attentional multiplicative gain field interaction' [3S], especially 
since the efficiency of the whole modeling, which must be repeated three times 
per second, was not addressed by the authors. 

We start here by supplementing the integrated binocular system presented 
in Section 6.4 with the most important subcortical and cortical pathways of the 
visuo-saccadic neural processes. These pathways depicted by arrowhead lines 
in Fig. 12, include the SC of the midbrain, which contains retinotopically or- 
ganized visual and oculomotor layers, the PEF, and the FEF in the parietal 
and frontal lobes of the neocortex (which themselves obtain inputs from many 
visual cortical areas) for assisting the SC in the control of the involuntary (PEF) 
Figure 11: The spatial pattern of perisaccadic compression. It shows experimental
data of the absolute mislocalization (lower row), referenced to the true
position of a flashed dot randomly chosen from an array of 24 dots, for four
different saccade amplitudes (upper row). Adapted from [55].



and voluntary (FEF) saccades. We also include the interhemispheric pathways,
the corpus callosum (about 500 million neuronal axons connecting the cerebral
cortical hemispheres) and the intercollicular commissure, because the coordinated
movement of the two eyes is a bihemispheric event. The motor commands
that originate from the brain's major hemisphere (the left hemisphere for most
right-handed people) travel across the corpus callosum to the minor hemisphere
and then down to the brainstem, where part of them again crosses to the other side of
the brain before both eyes are finally moved in coordination [17]. We believe
that building on the existing models [21] and accelerating advances in visual
neuroscience will soon allow the inclusion of these pathways such that the oper- 
ation of a more complete system of perisaccadic perception can be addressed in 
numerical modeling in a way that could be useful in neural engineering designs. 

The course of events taking place during perisaccadic perception, shown in 
Fig. 12, is as follows: the eyes are fixated at F and the new stimulus appears 
at T. The SC population T' at the retinotopic image of T (green spot in the 
left SC) calculates the position of the target T of an impending saccade. The 
SC also codes the motor command for the execution of the saccade. 

About 50 ms before the onset of the saccade, during the saccade (about
30 ms), and about 50 ms after the saccade, the visual sensitivity is reduced
and flashes (dark blue dots) around T are not perceived in veridical locations.
Instead, a copy of the motor command (efference copy) is sent to translate the
cortical image (light blue dots in V1) of flashes to remap it into a target-centered
Figure 12: The description is given in the text of the article.



frame (red dots in V1).

This internal remapping results in the illusory compression of flashes, shown 
by red arrows. The compression is perceived around the incoming target T even 
though the eyes' fixation is moving from F to T. The location of the cortical area
of neural correlates of remapping is uncertain; it is only required that this area is
retinotopically organized. Although it could be PEF/FEF, here, for simplicity,
this area is represented by V1.

During the fixation of the eyes at F, lasting on average about 300 ms, the image
is sampled by ganglion cells, $f_{k,l} = f(e^{u_k}e^{i\theta_l})$, and its DPFT $\hat{f}_{k,l}$ is computed
by FFT in log-polar coordinates $(u_k,\theta_l)$, where $u_k = \ln r_k$. The inverse DPFT,
computed again by FFT, gives a cortical image representation

$$\mathsf{f}_{k,l} = \mathsf{f}(u_k,\theta_l) = f(e^{u_k}e^{i\theta_l})$$

where disparity-sensitive cells contribute to building a 3D understanding of
the scene. In the same fixation period, the next saccade's target T is selected
(PEF/FEF) and its position with respect to the fovea is calculated and converted
into the motor command to move the eyes (SC). During the perisaccadic interval
of about 130 ms, the visual sensitivity is reduced and neural processes, using a
copy of the eyes' motor command (efference copy), transiently shift the cortical
image. In our modeling, this shift is generated using the shift property of the Fourier
transform as follows

k=0 1=0 

where $\delta$ is the corresponding spacing. It brings the presaccadic scene at F in
fovea-centered coordinates into postsaccadic scene T in target-centered coordi- 
nates. However, 

compresses perceptual space. 
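
The shift property invoked here can be illustrated with the generic DFT shift theorem. The sketch below assumes NumPy and is not the article's exact expression: it shifts an array of cortical samples by `j` rings and `q` sectors by multiplying its 2D DFT by a linear phase, and compares the result with a direct circular shift. The function name `shift_cortical` and the test data are hypothetical.

```python
import numpy as np

def shift_cortical(g, j, q):
    """Circularly shift g by j samples along u and q samples along theta via the DFT."""
    M, N = g.shape
    m = np.arange(M)[:, None]
    n = np.arange(N)[None, :]
    phase = np.exp(-2j * np.pi * (m * j / M + n * q / N))  # shift theorem
    return np.fft.ifft2(np.fft.fft2(g) * phase).real

rng = np.random.default_rng(1)
g = rng.random((37, 64))             # stand-in for cortical samples on the (u, theta) grid

shifted = shift_cortical(g, 5, -3)
reference = np.roll(g, shift=(5, -3), axis=(0, 1))
```

Because $u = \ln r$, a uniform shift by `j` samples along `u` corresponds on the retina to the zoom $r \mapsto re^{j\delta}$, while a shift along $\theta$ corresponds to a rotation. The shift is circular, which matters at the boundaries of the grid.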

6.5.1 Challenges with Implementation 

There are some problems that must be addressed before we can implement 
our modeling of primate perception. One problem is due to the fact that the 
model of retinotopy is fovea-less. The other is related to the global topology of 
retinotopy, and in particular, to the vertical meridian split in retinas (and hence 
in the visual field) of the brain's hemispheric projections. 

In order to address the first problem, we need to develop a model of retino- 
topy that will include both foveal and peripheral regions. Hence, the projective 
Fourier transform that gives extrafoveal image representation must be comple- 
mented with a transform for the foveal image representation. Two different 
transforms, foveal and extrafoveal, could conform to the accumulated evidence 
indicating that the fovea and periphery have different functional roles in vision 
and may have visual processing differences [5TJ |7D]. Perhaps the simplest way 
to construct the foveal image transform is to restrict the action of the group SL(2,C) 
(which gives both image projective transformations and Möbius geometry) to 
Euclidean or affine subgroups. We refer to Section 3.2 in where Euclidean 
Fourier transform is introduced in the framework of representation theory to 
motivate the construction of PFT and can be seen as its 'restriction' to the 
Euclidean subgroup of SL(2,C). The affine subgroup could bring the wavelet 
transform to supplement PFT. 

The second problem involves two facts that are not compatible with each 
other: the computation of the DPFT of an image in log-polar coordinates by FFT, 
and the foveal split along the vertical meridian with the partial crossing that 
reorganizes the retinal outputs so that the left hemisphere destinations receive 
information from the right visual field, and the right hemisphere destinations 
receive information from the left visual field. The retina (that is, the image 
plane of the conformal camera) with the foveal disc removed has the visual 
field represented by an annulus, which under the complex logarithm $w = \ln z$ 
is mapped into a rectangle. In order to discretize PFT, this rectangle must be 
extended periodically, which forces a quasiperiodic extension of the annulus; see 
Eq. 21 in [58]. In our numerical experiments with the image translation by the 
corresponding shift property of DPFT, the image 'disappeared' into the foveal 
region of the cortical area (the foveal region of the retina) to reappear from the 
opposite side of the rectangle (opposite circular boundary of the annulus). Also, 
we need to modify the FFT to account for the global retinotopy simulated in 
Fig. 9 (f) by the cut-and-paste transformations. Clearly, the two problems are 
interdependent. 
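
The wrap-around described above follows from the periodicity built into the DFT and can be reproduced on a small grid. The sketch below is illustrative only: the array sizes and the function names `fft_shift_rows` and `padded_shift_rows` are hypothetical, rows index $u = \ln r$ (row 0 lies next to the removed foveal disc), and the zero-padding variant is a generic signal-processing workaround, not the modification of the FFT for global retinotopy that the text calls for.

```python
import numpy as np


def fft_shift_rows(g, j):
    """Shift g by j rows using the DFT shift property (inherently circular)."""
    M = g.shape[0]
    phase = np.exp(-2j * np.pi * np.arange(M)[:, None] * j / M)
    return np.fft.ifft(np.fft.fft(g, axis=0) * phase, axis=0).real


def padded_shift_rows(g, j):
    """Same shift after zero-padding |j| rows at both ends and cropping back.

    Content that leaves the grid is discarded instead of wrapping around.
    """
    p = abs(j)
    padded = np.pad(g, ((p, p), (0, 0)))
    return fft_shift_rows(padded, j)[p:p + g.shape[0]]


M, N = 16, 8                 # hypothetical grid: M rings (u) by N sectors (theta)
g = np.zeros((M, N))
g[1, :] = 1.0                # a ring of activity next to the foveal (inner) boundary

wrapped = fft_shift_rows(g, -3)      # move 3 rows toward the fovea
cropped = padded_shift_rows(g, -3)

print(np.argmax(wrapped.sum(axis=1)))    # 14, i.e. row M - 2 at the outer boundary
print(np.abs(cropped).max() < 1e-9)      # the ring has left the grid instead of wrapping
```

The first result shows the ring crossing the inner edge of the rectangle and re-entering at the outer edge, which is the behavior reported for the annulus. Padding removes the wrap-around but does nothing to represent the fovea or the vertical meridian split.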

7 Conclusions 

In this article we presented a comprehensive account of our approach to computational 
vision developed over the last decade. We did so by bringing together in 
one place the physiological and behavioral aspects of primate visual perception and 
the conformal camera's computational harmonic analysis with the underlying 
geometry. This allowed us to discuss remarkable advantages that the conformal 
camera possesses over other cameras used in computational vision. First, the 
conformal camera geometry fully accounts for the basic concepts of co-circularity 
and scale invariance employed by the human visual system in solving the difficult 
intermediate-level vision problems of grouping local elements into individual 
objects of natural scenes. Second, the conformal camera has its own harmonic 
analysis, projective Fourier analysis, for image representation and processing 
that is well adapted to image projective transformations and the retinotopic 
mapping of the brain's visual and oculomotor pathways. Projective Fourier 
analysis integrates the binocular model consisting of the head, eyes (conformal 
cameras), and the visual cortex into a single computational system. Based on 
this binocular system, we proposed a model of perisaccadic perception, including 
perisaccadic mislocalizations observed in laboratory experiments. More 
precisely, we modeled the presaccadic activity, which, through shifts of neurons' 
current receptive fields to their future postsaccadic locations, is thought to underlie 
remapping based on the anticipated saccadic eye movement (efference copy). 
The postsaccadic activity, during which the actual integration of visual features 
takes place, will be considered in the next stage of our modeling. 

Finally, we presented the challenges facing the implementation of our 
modeling. First, the fovea-less model of the retina, based on the discrete projective 
Fourier transform (DPFT) of an image, must be supplemented with a 
foveal image transform. Second, the computation of the DPFT with a fast 
Fourier transform algorithm (FFT) has to be modified in order to account for 
the global retinotopy of the brain's visual pathway. 

It was observed that saccades cause a compression not only of space but 
also of time [17]. In order to preserve visual stability during the saccadic scanpath, 
receptive fields undergo a fast remapping at the time of saccades. When 
the speed of this remapping approaches the physical limit of neural information 
transfer, relativistic-like effects are psychophysiologically observed and may 
cause space-time compression [ill HH]. Curiously, this suggestion can also be 
accounted for in our model based on projective Fourier analysis since the group 
of image projective transformations in the conformal camera is the double cover 
of the group of Lorentz transformations of Einstein's special relativity. 
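
The double-cover relation mentioned above can be checked numerically. The sketch below is a minimal illustration of the standard construction, assuming the usual identification of a space-time event $(t, x, y, z)$ with a $2 \times 2$ Hermitian matrix whose determinant is the interval $t^2 - x^2 - y^2 - z^2$; the function names and the random test matrix are hypothetical, and nothing here is specific to the conformal camera.

```python
import numpy as np


def hermitian_from_event(t, x, y, z):
    """Hermitian matrix X whose determinant is the interval t^2 - x^2 - y^2 - z^2."""
    return np.array([[t + z, x - 1j * y],
                     [x + 1j * y, t - z]])


def act(A, X):
    """Action X -> A X A^dagger of a matrix A in SL(2,C)."""
    return A @ X @ A.conj().T


rng = np.random.default_rng(0)
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = B / np.sqrt(np.linalg.det(B))          # rescale so that det A = 1

X = hermitian_from_event(1.0, 0.3, -0.2, 0.5)
Y = act(A, X)

print(np.isclose(np.linalg.det(A), 1.0))                # A lies in SL(2,C)
print(np.allclose(Y, Y.conj().T))                       # the image is again Hermitian
print(np.isclose(np.linalg.det(Y), np.linalg.det(X)))   # the interval is preserved
print(np.allclose(act(-A, X), Y))                       # A and -A act identically
```

The last check is the two-to-one property: $A$ and $-A$ induce the same interval-preserving linear map, which is what the term double cover refers to.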